Evaluation of Partitional Algorithms for Clustering Medical Documents

Authors

  • Omer I. E. Mohamed
  • F. H. Saad
  • Elfadil A. Mohamed
Abstract

There are large quantities of information about patients and their medical conditions. Discovering the trends and patterns hidden within stored medical documents could significantly enhance understanding of disease and medicine progression and management. Methods are needed to facilitate the discovery of trends and patterns within such large quantities of medical documents. Clustering medical documents into a small number of meaningful clusters is one such method, because working only with the cluster that contains the relevant documents should improve effectiveness and efficiency. The produced clusters must be of high quality because they will be used in further processing to discover the hidden trends and patterns. This paper experimentally evaluates the cluster quality of partitional clustering algorithms that use different criterion functions in the context of clustering medical documents. Our experimental results show that E1 leads to the best solution in terms of entropy when repeated bisection is used as the clustering method, and that I1 is the best in terms of both entropy and purity when the direct clustering method is used.

Keywords: Medical Documents, Partitional Algorithms, Document Clustering, Direct Clustering Method, Clusters' Quality.
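The abstract ranks solutions by entropy and purity but does not define them. As a rough illustration only, the sketch below computes both measures using the definitions commonly used in document clustering evaluation: per-cluster entropy normalized by the logarithm of the number of classes, and purity as the share of documents belonging to each cluster's majority class, both weighted by cluster size. Whether the paper uses exactly these formulas is an assumption, and the function name `entropy_and_purity` and its arguments are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_and_purity(labels, clusters):
    """Return (entropy, purity) of a clustering against known class labels.

    Assumed definitions (not taken from the paper itself):
      entropy = sum_r (n_r / n) * ( -1/log(q) * sum_i p_ri * log(p_ri) )
      purity  = sum_r (1 / n) * max_i n_ri
    where n_ri is the number of documents of class i in cluster r,
    p_ri = n_ri / n_r, and q is the number of classes.
    """
    n = len(labels)
    q = len(set(labels))

    # Count how many documents of each class fall into each cluster.
    by_cluster = defaultdict(Counter)
    for label, cluster in zip(labels, clusters):
        by_cluster[cluster][label] += 1

    entropy = 0.0
    purity = 0.0
    for counts in by_cluster.values():
        n_r = sum(counts.values())
        if q > 1:
            e_r = -sum((c / n_r) * math.log(c / n_r)
                       for c in counts.values()) / math.log(q)
        else:
            e_r = 0.0  # a single class makes every cluster trivially pure
        entropy += (n_r / n) * e_r
        purity += max(counts.values()) / n
    return entropy, purity


# Toy usage: four documents, two classes, two candidate clusterings.
classes = ["cardiology", "cardiology", "oncology", "oncology"]
perfect = entropy_and_purity(classes, [0, 0, 1, 1])
mixed = entropy_and_purity(classes, [0, 1, 0, 1])
```

Under these definitions a lower entropy and a higher purity both indicate a better solution, so `perfect` scores better than `mixed` on both measures.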

Similar resources

Pre Processing Techniques for Arabic Documents Clustering

Clustering of text documents is an important technique for document retrieval. It aims to organize documents into meaningful groups or clusters. Text preprocessing plays a major role in enhancing the clustering of Arabic documents. This research examines and compares text preprocessing techniques in Arabic document clustering. It also studies effectiveness of text preprocessing techniques: ...


A Particle Swarm Optimization based fuzzy c means approach for efficient web document clustering

There is a need to organize large sets of documents into categories through clustering, so that searching for and finding relevant information on the web, with its large number of documents, becomes easier and quicker. Hence we need more efficient clustering algorithms for organizing documents. Clustering on large text dataset can be effectively done using partitional clustering algorithm...


Experimental Estimation of Number of Clusters Based on Cluster Quality

Text clustering is a text mining technique that divides a given set of text documents into significant clusters. It is used for organizing a huge number of text documents into a well-organized form. In the majority of clustering algorithms, the number of clusters must be specified a priori, which is a drawback of these algorithms. The aim of this paper is to show experimentally how to det...


Comparison of Agglomerative and Partitional Document Clustering Algorithms

Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters, and in greatly improving the retrieval performance either via cluster-driven dimensionality reduction, term-weighting, or query expansion. This ever-increasing importance of do...


A Comparison of Two Document Clustering Approaches for Clustering Medical Documents

form of medical reports. Such documents contain important information about patients, disease progression and management, but are difficult to analyse with conventional data mining techniques due to their unstructured nature. Clustering the medical documents into a small number of meaningful clusters may facilitate discovering patterns by allowing us to extract a number of relevant features from ...




Publication date: 2012